<<<<<<< HEAD Evaluating Winner’s Curse methods

In this document, we intend to investigate the following key questions, assuming a fixed array of \(10^6\) SNPs and a quantitative trait in which SNP effect sizes follow a normal distribution:

1. When is Winner’s Curse a problem?

In this section, we look at the average number of significant SNPs, the average proportion of these significant SNPs whose association estimates are more extreme than their true effect sizes, and the average MSE of significant SNPs at two different thresholds: the common genome-wide significance threshold of \(5 \times 10^{-8}\) and a more lenient threshold of \(5 \times 10^{-4}\). We consider these properties under combinations of values for the following parameters:

  1. sample size - n_samples
  2. heritability - h2
  3. polygenicity, i.e. proportion of effect SNPs - prop_effect
  4. selection coefficient - S
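For reference, the two p-value thresholds above can be translated into cut-offs on the absolute z-statistic. The snippet below is a minimal Python sketch, not taken from the R scripts referenced in this document; the function name `z_cutoff` is hypothetical and a two-sided test is assumed.

```python
from statistics import NormalDist


def z_cutoff(alpha: float) -> float:
    """Two-sided |z| cut-off for a p-value threshold alpha (assumes a two-sided test)."""
    # Work in the lower tail and negate: this is more accurate than
    # inv_cdf(1 - alpha / 2) when alpha is very small.
    return -NormalDist().inv_cdf(alpha / 2)


for alpha in (5e-8, 5e-4):
    print(f"alpha = {alpha:g}: significant if |z| > {z_cutoff(alpha):.2f}")
```

Under this assumption the cut-offs are roughly 5.45 for \(5 \times 10^{-8}\) and 3.48 for \(5 \times 10^{-4}\).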

The 24 different combinations that we will investigate throughout this document are detailed below:

Scenario n_samples h2 prop_effect S
1 30,000 0.3 0.010 -1
2 300,000 0.3 0.010 -1
3 30,000 0.8 0.010 -1
4 300,000 0.8 0.010 -1
5 30,000 0.3 0.001 -1
6 300,000 0.3 0.001 -1
7 30,000 0.8 0.001 -1
8 300,000 0.8 0.001 -1
9 30,000 0.3 0.010 0
10 300,000 0.3 0.010 0
11 30,000 0.8 0.010 0
12 300,000 0.8 0.010 0
13 30,000 0.3 0.001 0
14 300,000 0.3 0.001 0
15 30,000 0.8 0.001 0
16 300,000 0.8 0.001 0
17 30,000 0.3 0.010 1
18 300,000 0.3 0.010 1
19 30,000 0.8 0.010 1
20 300,000 0.8 0.010 1
21 30,000 0.3 0.001 1
22 300,000 0.3 0.001 1
23 30,000 0.8 0.001 1
24 300,000 0.8 0.001 1
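The scenario table above is a full factorial grid over the four parameters. As a minimal Python sketch (an illustration only, not the authors' R code; the variable names are assumptions), the same 24 rows can be generated in the same order as follows:

```python
from itertools import product

n_samples_values = (30_000, 300_000)
h2_values = (0.3, 0.8)
prop_effect_values = (0.01, 0.001)
s_values = (-1, 0, 1)

# product() varies its last argument fastest, so the slowest-changing
# parameter in the table (S) is listed first and n_samples last.
scenarios = [
    {"scenario": i, "n_samples": n, "h2": h2, "prop_effect": p, "S": s}
    for i, (s, p, h2, n) in enumerate(
        product(s_values, prop_effect_values, h2_values, n_samples_values),
        start=1,
    )
]

for row in scenarios:
    print(row)
```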


Running the code provided in nsig_prop_bias_100sim.R, we obtain the following results:

Scenario n_samples h2 prop_effect S n_sig_5e-8 prop_bias_5e-8 mse_5e-8 n_sig_5e-4 prop_bias_5e-4 mse_5e-4 sd(n_sig)_5e-8 sd(prop_bias)_5e-8 sd(mse)_5e-8 sd(n_sig)_5e-4 sd(prop_bias)_5e-4 sd(mse)_5e-4
1 30,000 0.3 0.010 -1 0.70 1.0000 0.001573 612.78 0.9997 0.001900 0.745 0.0000 0.001234 24.841 0.0007 0.000108
2 300,000 0.3 0.010 -1 848.63 0.7619 0.000022 3200.03 0.7473 0.000049 18.145 0.0142 0.000002 38.450 0.0075 0.000002
3 30,000 0.8 0.010 -1 31.85 0.9804 0.000598 1083.65 0.9591 0.001186 5.208 0.0280 0.000198 32.603 0.0054 0.000063
4 300,000 0.8 0.010 -1 2760.68 0.6284 0.000017 5362.80 0.6391 0.000035 30.091 0.0084 0.000001 52.753 0.0066 0.000001
5 30,000 0.3 0.001 -1 86.90 0.7591 0.000214 774.49 0.8946 0.001480 6.317 0.0437 0.000054 22.626 0.0097 0.000078
6 300,000 0.3 0.001 -1 568.70 0.5509 0.000016 1215.01 0.7302 0.000099 14.074 0.0225 0.000002 29.951 0.0139 0.000006
7 30,000 0.8 0.001 -1 276.26 0.6273 0.000168 987.38 0.8037 0.001183 10.413 0.0271 0.000027 25.221 0.0124 0.000066
8 300,000 0.8 0.001 -1 727.10 0.5257 0.000016 1324.22 0.7036 0.000092 12.491 0.0171 0.000001 25.793 0.0120 0.000006
9 30,000 0.3 0.010 0 1.45 1.0000 0.001661 622.50 0.9980 0.001822 1.298 0.0000 0.005559 25.639 0.0018 0.000089
10 300,000 0.3 0.010 0 882.45 0.7297 0.000012 3059.66 0.7442 0.000046 18.435 0.0152 0.000001 41.677 0.0064 0.000003
11 30,000 0.8 0.010 0 48.06 0.9509 0.000245 1115.40 0.9402 0.001088 6.350 0.0301 0.000050 30.487 0.0070 0.000065
12 300,000 0.8 0.010 0 2586.78 0.6204 0.000011 5034.98 0.6449 0.000032 32.771 0.0096 0.000000 50.379 0.0062 0.000002
13 30,000 0.3 0.001 0 88.32 0.7254 0.000116 752.41 0.8953 0.001489 6.377 0.0480 0.000025 23.212 0.0104 0.000070
14 300,000 0.3 0.001 0 531.27 0.5560 0.000012 1182.74 0.7392 0.000100 12.431 0.0215 0.000001 26.240 0.0107 0.000006
15 30,000 0.8 0.001 0 257.44 0.6234 0.000106 946.54 0.8109 0.001207 8.936 0.0276 0.000013 23.704 0.0121 0.000058
16 300,000 0.8 0.001 0 691.46 0.5268 0.000013 1297.90 0.7097 0.000093 13.526 0.0185 0.000001 24.238 0.0118 0.000005
17 30,000 0.3 0.010 1 2.55 0.9941 0.000610 639.55 0.9959 0.001760 1.623 0.0405 0.000674 26.371 0.0024 0.000087
18 300,000 0.3 0.010 1 919.28 0.7031 0.000010 2905.76 0.7332 0.000046 15.799 0.0142 0.000001 37.008 0.0070 0.000003
19 30,000 0.8 0.010 1 68.13 0.9178 0.000198 1148.37 0.9199 0.001029 7.795 0.0370 0.000087 30.266 0.0080 0.000056
20 300,000 0.8 0.010 1 2433.27 0.6105 0.000009 4634.00 0.6436 0.000033 30.705 0.0107 0.000000 45.304 0.0058 0.000001
21 30,000 0.3 0.001 1 93.15 0.7056 0.000096 741.13 0.8962 0.001513 5.960 0.0428 0.000015 27.299 0.0101 0.000095
22 300,000 0.3 0.001 1 482.94 0.5516 0.000009 1122.87 0.7510 0.000104 12.742 0.0244 0.000001 26.147 0.0100 0.000006
23 30,000 0.8 0.001 1 244.29 0.6120 0.000089 916.49 0.8199 0.001243 9.237 0.0272 0.000010 26.753 0.0122 0.000074
24 300,000 0.8 0.001 1 632.74 0.5336 0.000010 1241.42 0.7219 0.000095 14.730 0.0194 0.000001 26.609 0.0125 0.000005


It is important to note that for scenarios 1, 9 and 17, very few SNPs are significant on average at the \(5 \times 10^{-8}\) threshold (0.70, 1.45 and 2.55, respectively). In some simulations, no SNPs at all reach significance at this threshold. We must keep this in mind when investigating the performance of methods under these three scenarios.

For both thresholds, the average number of significant SNPs increases with sample size, as expected. It also increases with heritability. The effect of changing prop_effect is more interesting. At the genome-wide significance threshold, decreasing the proportion of effect SNPs from 0.01 to 0.001 increases the number of significant SNPs for a sample size of 30,000 but decreases it for the larger sample size of 300,000.

Furthermore, increasing sample size and increasing heritability from 0.3 to 0.8 both tend to decrease the fraction of significant SNPs whose estimates are more extreme than their true effect size. Decreasing polygenicity from 0.01 to 0.001 has the same effect at a significance threshold of \(5 \times 10^{-8}\).

To gain better insight into the table above, we simulate a single set of GWAS summary statistics for each of the 24 scenarios and plot \(z\) against \(\text{bias}\), where \(\text{bias} = \hat\beta - \beta\). On all figures, the bright red line corresponds to the significance threshold of \(5 \times 10^{-8}\), while the darker red line corresponds to \(5 \times 10^{-4}\).
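The quantities summarised in this section (number of significant SNPs, proportion with exaggerated estimates, and MSE) can be computed from one simulated data set along the following lines. This is a minimal Python sketch rather than the code in nsig_prop_bias_100sim.R: the function name `summarise_significant`, the toy effect-size model and the constant standard error are all assumptions made for illustration, and "more extreme" is taken to mean \(|\hat\beta| > |\beta|\).

```python
import random
from statistics import NormalDist


def summarise_significant(beta, beta_hat, se, alpha):
    """Return (n_sig, prop_bias, mse) for SNPs with two-sided p-value below alpha."""
    z_cut = -NormalDist().inv_cdf(alpha / 2)
    sig = [(b, bh) for b, bh, s in zip(beta, beta_hat, se) if abs(bh / s) > z_cut]
    n_sig = len(sig)
    if n_sig == 0:
        # No significant SNPs: the proportion and MSE are undefined.
        return 0, None, None
    # Assumed reading of "more extreme": |estimate| exceeds |true effect|.
    prop_bias = sum(abs(bh) > abs(b) for b, bh in sig) / n_sig
    mse = sum((bh - b) ** 2 for b, bh in sig) / n_sig
    return n_sig, prop_bias, mse


# Toy data (hypothetical values, far smaller than the 10^6 SNPs used in the text).
rng = random.Random(1)
n_snps = 20_000
se = [0.01] * n_snps
beta = [rng.gauss(0, 0.02) if rng.random() < 0.05 else 0.0 for _ in range(n_snps)]
beta_hat = [rng.gauss(b, s) for b, s in zip(beta, se)]

for alpha in (5e-8, 5e-4):
    print(alpha, summarise_significant(beta, beta_hat, se, alpha))
```

Returning `None` when nothing is significant mirrors the situation noted for scenarios 1, 9 and 17, where a simulation may contain no significant SNPs at \(5 \times 10^{-8}\).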

2. Evaluating methods using a significance threshold of \(5 \times 10^{-8}\)

Using the code detailed in norm_5e-8_10sim.R and a total of 10 simulations, we evaluate six different Winner’s Curse methods (EB, FIQT, BR, cl1, cl2 and cl3) across each of the 24 scenarios using the following three bias evaluation metrics:

  1. The average fraction of significant SNPs whose association estimates are less biased after applying the method - flb
  2. The average change in the average MSE of significant SNPs after applying the method - mse
  3. The average relative change in the average MSE of significant SNPs after applying the method - rel_mse

Note: All averages are obtained over only those simulations in which at least one significant SNP was detected.

Firstly, the fraction of the \(n\) significant SNPs whose association estimates are less biased after applying the method may be written as: \[\frac{1}{n} \; \sum_{i=1}^{n}\mathbb{I} \left\{ \left| \hat\beta_i - \beta_i \right| > \left|\hat\beta_{\text{adj,}i} - \beta_i\right| \right\},\] in which \(\left| \frac{\hat\beta_i}{\hat\sigma_i} \right| > Z_{\frac{\alpha}{2}}\) for all \(i = 1,...,n\). Here \(\hat\beta_i\) is the naive estimated effect size of SNP \(i\), \(\hat\sigma_i\) is its estimated standard error, \(\beta_i\) is its true effect size and \(\hat\beta_{\text{adj,}i}\) is the adjusted effect size estimate obtained by applying the Winner’s Curse correction method of interest. The significance threshold is represented by \(\alpha\).

Using the same notation, the average MSE over \(n\) significant SNPs is defined as: \[\frac{1}{n} \sum^n_{i=1} (\hat\beta_i - \beta_i)^2.\] Thus, using the above, we may formally define the change in average MSE of significant SNPs as: \[\frac{1}{n} \sum^n_{i=1} (\hat\beta_{\text{adj,}i} - \beta_i)^2 - \frac{1}{n} \sum^n_{i=1} (\hat\beta_i - \beta_i)^2\] and the relative change in average MSE of significant SNPs as: \[\frac{\frac{1}{n} \sum^n_{i=1} (\hat\beta_{\text{adj,}i} - \beta_i)^2 - \frac{1}{n} \sum^n_{i=1} (\hat\beta_i - \beta_i)^2}{\frac{1}{n} \sum^n_{i=1} (\hat\beta_i - \beta_i)^2}.\]
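The three metrics defined above can be written out directly. The following is a minimal Python sketch under the assumption that the inputs are already restricted to the significant SNPs of one simulation; the function names `evaluate_adjustment` and `average_over_simulations` are hypothetical and do not come from norm_5e-8_10sim.R.

```python
def evaluate_adjustment(beta, beta_hat, beta_adj):
    """Compute flb, mse and rel_mse for the significant SNPs of one simulation.

    beta: true effects; beta_hat: naive estimates; beta_adj: adjusted estimates.
    Returns None when there are no significant SNPs.
    """
    n = len(beta)
    if n == 0:
        return None
    # Fraction of SNPs whose adjusted estimate is closer to the truth.
    flb = sum(
        abs(bh - b) > abs(ba - b) for b, bh, ba in zip(beta, beta_hat, beta_adj)
    ) / n
    mse_naive = sum((bh - b) ** 2 for b, bh in zip(beta, beta_hat)) / n
    mse_adj = sum((ba - b) ** 2 for b, ba in zip(beta, beta_adj)) / n
    mse_change = mse_adj - mse_naive  # negative values mean the method helped
    # Assumes mse_naive > 0, i.e. the naive estimates are not all exact.
    return {"flb": flb, "mse": mse_change, "rel_mse": mse_change / mse_naive}


def average_over_simulations(results):
    """Average each metric over simulations with at least one significant SNP."""
    kept = [r for r in results if r is not None]
    if not kept:
        return None
    return {key: sum(r[key] for r in kept) / len(kept) for key in kept[0]}


# Two significant SNPs: the first estimate improves, the second gets worse.
one_sim = evaluate_adjustment(
    beta=[0.10, 0.20], beta_hat=[0.14, 0.22], beta_adj=[0.11, 0.17]
)
print(one_sim)
print(average_over_simulations([one_sim, None]))
```

In this toy example flb is 0.5, the change in MSE is about -0.0005 and the relative change is about -0.5. The second call shows a simulation without significant SNPs (`None`) being dropped from the average, as described in the note above.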


The simulation results are plotted with error bars. These figures show more clearly the scenarios in which applying a Winner’s Curse correction method would be beneficial, and give a better indication of which method to use.

Summary of results for flb contained in norm_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 1.0000 1.0000 1.0000 0.9444 1.0000 1.0000
2 300,000 0.3 0.010 -1 0.6465 0.4422 0.6078 0.5290 0.5049 0.5138
3 30,000 0.8 0.010 -1 0.8838 0.7446 0.7949 0.6342 0.7626 0.6896
4 300,000 0.8 0.010 -1 0.5681 0.2803 0.5299 0.5149 0.4874 0.4986
5 30,000 0.3 0.001 -1 0.6840 0.3621 0.5168 0.5419 0.5012 0.4933
6 300,000 0.3 0.001 -1 0.4882 0.1447 0.2824 0.5031 0.4741 0.4943
7 30,000 0.8 0.001 -1 0.5195 0.2095 0.3149 0.5085 0.4758 0.4878
8 300,000 0.8 0.001 -1 0.4703 0.1016 0.2803 0.5137 0.4648 0.4934
9 30,000 0.3 0.010 0 0.8889 1.0000 0.9238 0.9333 0.8889 0.9167
10 300,000 0.3 0.010 0 0.6178 0.4004 0.5783 0.5233 0.4975 0.5189
11 30,000 0.8 0.010 0 0.8420 0.6658 0.7481 0.6007 0.6478 0.5947
12 300,000 0.8 0.010 0 0.5568 0.2688 0.5268 0.5125 0.4857 0.4969
13 30,000 0.3 0.001 0 0.6267 0.3232 0.4546 0.5343 0.4828 0.4807
14 300,000 0.3 0.001 0 0.4802 0.1446 0.2737 0.5164 0.4768 0.5002
15 30,000 0.8 0.001 0 0.5025 0.2091 0.2759 0.5143 0.4810 0.5001
16 300,000 0.8 0.001 0 0.4821 0.1072 0.2917 0.4994 0.4693 0.4940
17 30,000 0.3 0.010 1 0.8021 0.8883 0.8020 0.7852 0.8611 0.7370
18 300,000 0.3 0.010 1 0.6089 0.3625 0.5485 0.5336 0.4955 0.5182
19 30,000 0.8 0.010 1 0.8150 0.5901 0.6627 0.5702 0.6108 0.5913
20 300,000 0.8 0.010 1 0.5501 0.2605 0.5135 0.5152 0.4860 0.5015
21 30,000 0.3 0.001 1 0.5419 0.2678 0.4008 0.5413 0.4756 0.5264
22 300,000 0.3 0.001 1 0.4867 0.1301 0.2748 0.5046 0.4663 0.4881
23 30,000 0.8 0.001 1 0.4988 0.1816 0.2746 0.4995 0.4670 0.5049
24 300,000 0.8 0.001 1 0.4588 0.1123 0.2948 0.4999 0.4688 0.4755

Figure: Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-8}\).

Summary of results for mse contained in norm_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.001001 -0.003130 -0.001256 -0.000626 -0.001466 -0.001470
2 300,000 0.3 0.010 -1 -0.000007 0.000000 -0.000005 0.000055 0.000022 0.000034
3 30,000 0.8 0.010 -1 -0.000433 -0.000434 -0.000483 0.000273 -0.000324 0.000008
4 300,000 0.8 0.010 -1 -0.000002 0.000005 0.000000 0.000033 0.000021 0.000026
5 30,000 0.3 0.001 -1 -0.000071 0.000180 0.000023 0.000526 0.000261 0.000438
6 300,000 0.3 0.001 -1 0.000000 0.000006 0.000013 0.000017 0.000009 0.000012
7 30,000 0.8 0.001 -1 -0.000005 0.000118 0.000164 0.000375 0.000202 0.000265
8 300,000 0.8 0.001 -1 0.000001 0.000004 0.000007 0.000008 0.000022 0.000012
9 30,000 0.3 0.010 0 -0.000837 -0.000438 -0.000583 -0.000180 -0.000545 -0.000580
10 300,000 0.3 0.010 0 -0.000003 0.000001 -0.000002 0.000027 0.000011 0.000018
11 30,000 0.8 0.010 0 -0.000148 -0.000145 -0.000167 0.000156 -0.000057 0.000055
12 300,000 0.8 0.010 0 -0.000001 0.000002 -0.000001 0.000021 0.000011 0.000015
13 30,000 0.3 0.001 0 -0.000026 0.000061 0.000024 0.000243 0.000129 0.000185
14 300,000 0.3 0.001 0 0.000000 0.000005 0.000009 0.000014 0.000009 0.000009
15 30,000 0.8 0.001 0 -0.000001 0.000061 0.000096 0.000193 0.000121 0.000165
16 300,000 0.8 0.001 0 0.000001 0.000004 0.000006 0.000011 0.000022 0.000014
17 30,000 0.3 0.010 1 -0.000305 -0.000368 -0.000368 -0.000215 -0.000548 -0.000187
18 300,000 0.3 0.010 1 -0.000002 0.000002 -0.000001 0.000023 0.000010 0.000015
19 30,000 0.8 0.010 1 -0.000116 -0.000084 -0.000100 0.000223 -0.000016 0.000081
20 300,000 0.8 0.010 1 -0.000001 0.000002 0.000000 0.000016 0.000009 0.000012
21 30,000 0.3 0.001 1 -0.000007 0.000079 0.000053 0.000229 0.000118 0.000131
22 300,000 0.3 0.001 1 0.000000 0.000004 0.000008 0.000012 0.000007 0.000008
23 30,000 0.8 0.001 1 0.000003 0.000063 0.000102 0.000173 0.000097 0.000117
24 300,000 0.8 0.001 1 0.000001 0.000002 0.000005 0.000009 0.000037 0.000018

Figure: Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\).

Summary of results for rel_mse contained in norm_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 0.0507 -0.9492 -0.9003 -0.6770 -0.8856 -0.8814
2 300,000 0.3 0.010 -1 -0.3318 0.0221 -0.2262 2.6268 1.0571 1.6423
3 30,000 0.8 0.010 -1 -0.7151 -0.6603 -0.7701 0.5325 -0.5115 0.0245
4 300,000 0.8 0.010 -1 -0.0962 0.3036 -0.0255 1.9721 1.2329 1.5102
5 30,000 0.3 0.001 -1 -0.2889 0.9262 0.1466 2.3456 1.1771 2.3775
6 300,000 0.3 0.001 -1 0.0258 0.4226 0.8421 1.0807 0.5761 0.7620
7 30,000 0.8 0.001 -1 -0.0257 0.7143 1.0937 2.1556 1.3581 1.4677
8 300,000 0.8 0.001 -1 0.0645 0.2524 0.4507 0.5107 1.4334 0.7650
9 30,000 0.3 0.010 0 -0.8664 -0.8423 -0.7283 -0.1695 -0.8529 -0.6569
10 300,000 0.3 0.010 0 -0.2659 0.0530 -0.1682 2.3019 0.9933 1.5287
11 30,000 0.8 0.010 0 -0.6536 -0.5714 -0.6742 0.7167 -0.2226 0.2667
12 300,000 0.8 0.010 0 -0.0971 0.2097 -0.0475 1.9913 1.0455 1.3835
13 30,000 0.3 0.001 0 -0.2386 0.5800 0.2616 2.2566 1.1942 1.8102
14 300,000 0.3 0.001 0 0.0410 0.4408 0.8377 1.1782 0.8077 0.8149
15 30,000 0.8 0.001 0 -0.0116 0.6069 0.9116 1.8770 1.1874 1.5703
16 300,000 0.8 0.001 0 0.0698 0.3047 0.4743 0.8816 1.7814 1.1572
17 30,000 0.3 0.010 1 -0.2448 -0.8222 -0.3110 0.1454 -0.7682 -0.2974
18 300,000 0.3 0.010 1 -0.2176 0.1789 -0.0560 2.3831 1.0067 1.5483
19 30,000 0.8 0.010 1 -0.6034 -0.4314 -0.5516 1.1932 -0.0643 0.4635
20 300,000 0.8 0.010 1 -0.0754 0.2280 0.0061 1.8173 1.0108 1.3115
21 30,000 0.3 0.001 1 -0.0737 0.8246 0.5939 2.3955 1.2562 1.2410
22 300,000 0.3 0.001 1 0.0448 0.4270 0.8541 1.2463 0.7071 0.8717
23 30,000 0.8 0.001 1 0.0351 0.7122 1.1913 1.8746 1.1730 1.3601
24 300,000 0.8 0.001 1 0.1411 0.2342 0.5176 0.8523 3.6076 1.7753

Figure: Relative change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\).

3. Evaluating methods using a significance threshold of \(5 \times 10^{-4}\)

As in Section 2, we use the code detailed in norm_5e-4_10sim.R with a total of 10 simulations to evaluate the six Winner’s Curse methods across each of the 24 scenarios. The same three bias evaluation metrics are considered.

Summary of results for flb contained in norm_5e-4_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 0.9454 0.9449 0.9724 0.9695 0.9908 0.9810
2 300,000 0.3 0.010 -1 0.5998 0.4608 0.5225 0.6079 0.6104 0.6041
3 30,000 0.8 0.010 -1 0.7565 0.7121 0.7531 0.7961 0.8548 0.8243
4 300,000 0.8 0.010 -1 0.5487 0.3505 0.4950 0.5632 0.5541 0.5527
5 30,000 0.3 0.001 -1 0.7917 0.7348 0.7656 0.8377 0.8404 0.8363
6 300,000 0.3 0.001 -1 0.6661 0.4972 0.5711 0.7068 0.6953 0.7118
7 30,000 0.8 0.001 -1 0.7078 0.6059 0.6280 0.7589 0.7546 0.7597
8 300,000 0.8 0.001 -1 0.6461 0.4438 0.5529 0.6983 0.6785 0.6876
9 30,000 0.3 0.010 0 0.9257 0.9170 0.9541 0.9556 0.9836 0.9656
10 300,000 0.3 0.010 0 0.5921 0.4611 0.5249 0.6127 0.6076 0.6077
11 30,000 0.8 0.010 0 0.7472 0.6949 0.7264 0.7870 0.8369 0.8084
12 300,000 0.8 0.010 0 0.5487 0.3547 0.5040 0.5672 0.5561 0.5614
13 30,000 0.3 0.001 0 0.7988 0.7340 0.7672 0.8391 0.8475 0.8430
14 300,000 0.3 0.001 0 0.6666 0.5070 0.5683 0.7161 0.6998 0.7102
15 30,000 0.8 0.001 0 0.7119 0.6147 0.6375 0.7682 0.7640 0.7697
16 300,000 0.8 0.001 0 0.6521 0.4562 0.5575 0.6935 0.6720 0.6790
17 30,000 0.3 0.010 1 0.8997 0.8978 0.9285 0.9448 0.9684 0.9548
18 300,000 0.3 0.010 1 0.5944 0.4521 0.5157 0.6140 0.6135 0.6090
19 30,000 0.8 0.010 1 0.7288 0.6709 0.7072 0.7641 0.8077 0.7797
20 300,000 0.8 0.010 1 0.5524 0.3568 0.5034 0.5718 0.5594 0.5648
21 30,000 0.3 0.001 1 0.7987 0.7440 0.7681 0.8492 0.8467 0.8481
22 300,000 0.3 0.001 1 0.6791 0.5203 0.5893 0.7270 0.7234 0.7250
23 30,000 0.8 0.001 1 0.7313 0.6308 0.6617 0.7742 0.7757 0.7785
24 300,000 0.8 0.001 1 0.6465 0.4756 0.5691 0.7071 0.6843 0.7013

Figure: Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-4}\).

Summary of results for mse contained in norm_5e-4_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.001804 -0.001801 -0.001738 -0.001627 -0.001359 -0.001496
2 300,000 0.3 0.010 -1 -0.000020 -0.000011 -0.000015 -0.000012 -0.000018 -0.000016
3 30,000 0.8 0.010 -1 -0.000897 -0.000889 -0.000896 -0.000799 -0.000779 -0.000826
4 300,000 0.8 0.010 -1 -0.000010 -0.000001 -0.000006 -0.000004 -0.000007 -0.000007
5 30,000 0.3 0.001 -1 -0.001187 -0.001180 -0.001078 -0.001162 -0.000976 -0.001079
6 300,000 0.3 0.001 -1 -0.000077 -0.000069 -0.000058 -0.000071 -0.000062 -0.000068
7 30,000 0.8 0.001 -1 -0.000895 -0.000824 -0.000727 -0.000876 -0.000762 -0.000846
8 300,000 0.8 0.001 -1 -0.000066 -0.000065 -0.000054 -0.000068 -0.000047 -0.000060
9 30,000 0.3 0.010 0 -0.001741 -0.001753 -0.001602 -0.001492 -0.001276 -0.001402
10 300,000 0.3 0.010 0 -0.000023 -0.000022 -0.000021 -0.000022 -0.000023 -0.000023
11 30,000 0.8 0.010 0 -0.000849 -0.000875 -0.000815 -0.000802 -0.000736 -0.000834
12 300,000 0.8 0.010 0 -0.000013 -0.000008 -0.000010 -0.000009 -0.000012 -0.000011
13 30,000 0.3 0.001 0 -0.001291 -0.001264 -0.001195 -0.001171 -0.001000 -0.001160
14 300,000 0.3 0.001 0 -0.000078 -0.000070 -0.000057 -0.000072 -0.000065 -0.000072
15 30,000 0.8 0.001 0 -0.000979 -0.000940 -0.000851 -0.000931 -0.000786 -0.000888
16 300,000 0.8 0.001 0 -0.000065 -0.000065 -0.000054 -0.000067 -0.000043 -0.000059
17 30,000 0.3 0.010 1 -0.001656 -0.001708 -0.001548 -0.001517 -0.001213 -0.001378
18 300,000 0.3 0.010 1 -0.000024 -0.000023 -0.000022 -0.000024 -0.000024 -0.000026
19 30,000 0.8 0.010 1 -0.000828 -0.000830 -0.000795 -0.000802 -0.000677 -0.000753
20 300,000 0.8 0.010 1 -0.000014 -0.000012 -0.000012 -0.000013 -0.000014 -0.000015
21 30,000 0.3 0.001 1 -0.001351 -0.001330 -0.001184 -0.001192 -0.000998 -0.001148
22 300,000 0.3 0.001 1 -0.000083 -0.000076 -0.000064 -0.000081 -0.000067 -0.000076
23 30,000 0.8 0.001 1 -0.001087 -0.001002 -0.000893 -0.001002 -0.000821 -0.000902
24 300,000 0.8 0.001 1 -0.000072 -0.000071 -0.000057 -0.000073 -0.000041 -0.000065

Figure: Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-4}\).

Summary of results for rel_mse contained in norm_5e-4_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.9495 -0.9531 -0.8988 -0.8988 -0.8988 -0.8988
2 300,000 0.3 0.010 -1 -0.4039 -0.2233 -0.3032 -0.3032 -0.3032 -0.3032
3 30,000 0.8 0.010 -1 -0.7519 -0.7415 -0.7406 -0.7406 -0.7406 -0.7406
4 300,000 0.8 0.010 -1 -0.2785 -0.0289 -0.1681 -0.1681 -0.1681 -0.1681
5 30,000 0.3 0.001 -1 -0.7974 -0.7718 -0.7368 -0.7368 -0.7368 -0.7368
6 300,000 0.3 0.001 -1 -0.7650 -0.6881 -0.5645 -0.5645 -0.5645 -0.5645
7 30,000 0.8 0.001 -1 -0.7600 -0.7027 -0.6101 -0.6101 -0.6101 -0.6101
8 300,000 0.8 0.001 -1 -0.7266 -0.6984 -0.5693 -0.5693 -0.5693 -0.5693
9 30,000 0.3 0.010 0 -0.9581 -0.9610 -0.8900 -0.8900 -0.8900 -0.8900
10 300,000 0.3 0.010 0 -0.5074 -0.4572 -0.4590 -0.4590 -0.4590 -0.4590
11 30,000 0.8 0.010 0 -0.7960 -0.8027 -0.7645 -0.7645 -0.7645 -0.7645
12 300,000 0.8 0.010 0 -0.3798 -0.2440 -0.3148 -0.3148 -0.3148 -0.3148
13 30,000 0.3 0.001 0 -0.8686 -0.8498 -0.7890 -0.7890 -0.7890 -0.7890
14 300,000 0.3 0.001 0 -0.7685 -0.7035 -0.5816 -0.5816 -0.5816 -0.5816
15 30,000 0.8 0.001 0 -0.8155 -0.7747 -0.6954 -0.6954 -0.6954 -0.6954
16 300,000 0.8 0.001 0 -0.7139 -0.6916 -0.5784 -0.5784 -0.5784 -0.5784
17 30,000 0.3 0.010 1 -0.9419 -0.9543 -0.8820 -0.8820 -0.8820 -0.8820
18 300,000 0.3 0.010 1 -0.5347 -0.4905 -0.4819 -0.4819 -0.4819 -0.4819
19 30,000 0.8 0.010 1 -0.7932 -0.7945 -0.7530 -0.7530 -0.7530 -0.7530
20 300,000 0.8 0.010 1 -0.4397 -0.3487 -0.3736 -0.3736 -0.3736 -0.3736
21 30,000 0.3 0.001 1 -0.8957 -0.8726 -0.7953 -0.7953 -0.7953 -0.7953
22 300,000 0.3 0.001 1 -0.8130 -0.7587 -0.6277 -0.6277 -0.6277 -0.6277
23 30,000 0.8 0.001 1 -0.8561 -0.8119 -0.7287 -0.7287 -0.7287 -0.7287
24 300,000 0.8 0.001 1 -0.7568 -0.7387 -0.6040 -0.6040 -0.6040 -0.6040

Figure: Relative change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-4}\).

4. Skewed distribution of effect sizes

Here we investigate the 24 different scenarios under a skewed distribution of effect sizes. To create a bimodal distribution, we simulate half of the effect sizes of the true effect SNPs from a normal distribution centered at 0 and the other half from a normal distribution with mean 2.5. As above, we first look at the expected number of significant SNPs and the expected proportion of those whose association estimates are exaggerated.
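The bimodal effect-size distribution described above can be sketched as an equal mixture of two normal components. This Python snippet is an illustration only: the function name `simulate_bimodal_effects` is hypothetical, the component standard deviation is not stated in the text and is assumed to be 1, and any later rescaling of effects to match a target heritability is not shown.

```python
import random


def simulate_bimodal_effects(n_effect, rng, mean_shift=2.5, sd=1.0):
    """Draw half of the effects from N(0, sd^2) and the rest from N(mean_shift, sd^2).

    sd = 1.0 is an assumption; the text only specifies the two means.
    """
    n_zero_centred = n_effect // 2
    effects = [rng.gauss(0.0, sd) for _ in range(n_zero_centred)]
    effects += [rng.gauss(mean_shift, sd) for _ in range(n_effect - n_zero_centred)]
    rng.shuffle(effects)  # avoid tying the mixture component to SNP order
    return effects


rng = random.Random(2024)
effects = simulate_bimodal_effects(10_000, rng)
# The mixture mean is 0.5 * 0 + 0.5 * 2.5 = 1.25, so the sample mean should be close to it.
print(sum(effects) / len(effects))
```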

Running the code provided in nsig_prop_bias_100sim.R, we obtain the following results:

Scenario n_samples h2 prop_effect S n_sig_5e-8 prop_bias_5e-8 mse_5e-8 sd(n_sig)_5e-8 sd(prop_bias)_5e-8 sd(mse)_5e-8
1 30,000 0.3 0.010 -1 0.49 1.0000 0.001239 0.659 0.0000 0.000966
2 300,000 0.3 0.010 -1 857.47 0.7724 0.000016 21.728 0.0140 0.000002
3 30,000 0.8 0.010 -1 24.81 0.9966 0.000442 4.890 0.0126 0.000117
4 300,000 0.8 0.010 -1 2821.07 0.6250 0.000014 29.028 0.0085 0.000001
5 30,000 0.3 0.001 -1 86.72 0.7677 0.000156 6.909 0.0434 0.000037
6 300,000 0.3 0.001 -1 568.01 0.5494 0.000014 12.738 0.0178 0.000002
7 30,000 0.8 0.001 -1 281.94 0.6214 0.000138 9.623 0.0309 0.000024
8 300,000 0.8 0.001 -1 729.36 0.5221 0.000015 11.343 0.0179 0.000001
9 30,000 0.3 0.010 0 0.54 1.0000 0.001643 0.731 0.0000 0.003033
10 300,000 0.3 0.010 0 911.67 0.7746 0.000012 22.312 0.0129 0.000001
11 30,000 0.8 0.010 0 23.16 0.9987 0.000392 4.683 0.0079 0.000069
12 300,000 0.8 0.010 0 2859.69 0.6116 0.000010 29.929 0.0097 0.000000
13 30,000 0.3 0.001 0 90.54 0.7672 0.000113 6.641 0.0470 0.000020
14 300,000 0.3 0.001 0 541.99 0.5429 0.000012 12.748 0.0191 0.000001
15 30,000 0.8 0.001 0 285.65 0.6135 0.000104 9.134 0.0300 0.000014
16 300,000 0.8 0.001 0 690.16 0.5257 0.000013 11.850 0.0204 0.000001
17 30,000 0.3 0.010 1 0.47 1.0000 0.001354 0.745 0.0000 0.000857
18 300,000 0.3 0.010 1 917.61 0.8198 0.000012 23.791 0.0108 0.000001
19 30,000 0.8 0.010 1 15.15 1.0000 0.000484 2.949 0.0000 0.000087
20 300,000 0.8 0.010 1 3134.20 0.6022 0.000010 25.334 0.0093 0.000000
21 30,000 0.3 0.001 1 91.80 0.8184 0.000123 6.123 0.0373 0.000022
22 300,000 0.3 0.001 1 522.21 0.5321 0.000012 8.395 0.0192 0.000001
23 30,000 0.8 0.001 1 313.70 0.6046 0.000096 9.802 0.0235 0.000010
24 300,000 0.8 0.001 1 628.33 0.5248 0.000014 10.876 0.0186 0.000001


Next, we repeat the process illustrated in Section 2 using the same bias evaluation metrics with a significance threshold of \(5 \times 10^{-8}\).

Summary of results for flb contained in skew_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
2 300,000 0.3 0.010 -1 0.6602 0.4526 0.6297 0.5353 0.5017 0.5087
3 30,000 0.8 0.010 -1 0.9320 0.8476 0.8899 0.6292 0.8174 0.7111
4 300,000 0.8 0.010 -1 0.5632 0.2800 0.5431 0.5125 0.4865 0.4974
5 30,000 0.3 0.001 -1 0.6102 0.3468 0.5279 0.5340 0.5144 0.5307
6 300,000 0.3 0.001 -1 0.4831 0.1459 0.2915 0.5022 0.4736 0.4830
7 30,000 0.8 0.001 -1 0.5371 0.2252 0.3257 0.5174 0.4789 0.4968
8 300,000 0.8 0.001 -1 0.4698 0.0984 0.2784 0.4953 0.4788 0.4828
9 30,000 0.3 0.010 0 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
10 300,000 0.3 0.010 0 0.6457 0.4562 0.6293 0.5325 0.4980 0.5059
11 30,000 0.8 0.010 0 0.9630 0.8636 0.9338 0.6971 0.8777 0.6586
12 300,000 0.8 0.010 0 0.5554 0.2651 0.5347 0.5090 0.4864 0.5031
13 30,000 0.3 0.001 0 0.6808 0.3087 0.5335 0.4939 0.4977 0.5004
14 300,000 0.3 0.001 0 0.4798 0.1316 0.2796 0.4985 0.4685 0.5068
15 30,000 0.8 0.001 0 0.5019 0.2051 0.2981 0.5115 0.4946 0.4934
16 300,000 0.8 0.001 0 0.4842 0.1064 0.2774 0.4993 0.4725 0.5010
17 30,000 0.3 0.010 1 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
18 300,000 0.3 0.010 1 0.6860 0.5203 0.6782 0.5451 0.4867 0.5060
19 30,000 0.8 0.010 1 0.9923 0.9563 0.9778 0.6972 0.8984 0.8678
20 300,000 0.8 0.010 1 0.5422 0.2553 0.5372 0.5129 0.4807 0.4942
21 30,000 0.3 0.001 1 0.6793 0.3596 0.6099 0.5534 0.4918 0.5101
22 300,000 0.3 0.001 1 0.4772 0.1125 0.2491 0.4955 0.4535 0.4924
23 30,000 0.8 0.001 1 0.5083 0.1797 0.2977 0.5154 0.4805 0.5028
24 300,000 0.8 0.001 1 0.4787 0.0890 0.2545 0.5117 0.4779 0.4917

Figure: Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes.

Summary of results for mse contained in skew_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.001274 -0.001701 -0.000495 -0.000563 -0.000640 -0.001265
2 300,000 0.3 0.010 -1 -0.000007 -0.000002 -0.000006 0.000038 0.000015 0.000024
3 30,000 0.8 0.010 -1 -0.000389 -0.000346 -0.000388 0.000060 -0.000262 -0.000136
4 300,000 0.8 0.010 -1 -0.000002 0.000003 -0.000001 0.000030 0.000016 0.000022
5 30,000 0.3 0.001 -1 -0.000035 0.000067 -0.000004 0.000434 0.000117 0.000272
6 300,000 0.3 0.001 -1 0.000001 0.000007 0.000012 0.000015 0.000011 0.000014
7 30,000 0.8 0.001 -1 -0.000010 0.000072 0.000088 0.000369 0.000185 0.000214
8 300,000 0.8 0.001 -1 0.000001 0.000004 0.000009 0.000010 0.000011 0.000007
9 30,000 0.3 0.010 0 -0.001155 -0.000693 -0.001221 -0.000859 -0.000830 -0.000999
10 300,000 0.3 0.010 0 -0.000004 -0.000001 -0.000004 0.000030 0.000012 0.000018
11 30,000 0.8 0.010 0 -0.000294 -0.000356 -0.000338 -0.000019 -0.000291 -0.000088
12 300,000 0.8 0.010 0 -0.000001 0.000003 -0.000001 0.000021 0.000012 0.000015
13 30,000 0.3 0.001 0 -0.000039 0.000063 -0.000014 0.000354 0.000102 0.000190
14 300,000 0.3 0.001 0 0.000000 0.000004 0.000006 0.000013 0.000008 0.000011
15 30,000 0.8 0.001 0 0.000001 0.000062 0.000053 0.000206 0.000107 0.000154
16 300,000 0.8 0.001 0 0.000001 0.000004 0.000007 0.000012 0.000009 0.000009
17 30,000 0.3 0.010 1 -0.001416 -0.000878 -0.001417 -0.001012 -0.000918 -0.001379
18 300,000 0.3 0.010 1 -0.000006 -0.000003 -0.000006 0.000029 0.000011 0.000019
19 30,000 0.8 0.010 1 -0.000390 -0.000419 -0.000449 -0.000168 -0.000368 -0.000272
20 300,000 0.8 0.010 1 -0.000001 0.000003 -0.000001 0.000021 0.000012 0.000016
21 30,000 0.3 0.001 1 -0.000046 0.000039 -0.000038 0.000272 0.000128 0.000198
22 300,000 0.3 0.001 1 0.000001 0.000004 0.000007 0.000014 0.000008 0.000010
23 30,000 0.8 0.001 1 0.000002 0.000075 0.000044 0.000203 0.000118 0.000139
24 300,000 0.8 0.001 1 0.000001 0.000004 0.000007 0.000009 0.000009 0.000006

Figure: Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes.

Summary of results for rel_mse contained in skew_5e-8_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.8635 -0.9212 -0.8219 -0.4293 -0.8107 -0.8705
2 300,000 0.3 0.010 -1 -0.4172 -0.1148 -0.3705 2.3193 0.8922 1.5814
3 30,000 0.8 0.010 -1 -0.8060 -0.8335 -0.8544 0.1729 -0.6060 -0.2871
4 300,000 0.8 0.010 -1 -0.1246 0.2416 -0.0750 2.2128 1.1484 1.6053
5 30,000 0.3 0.001 -1 -0.1651 0.5245 0.0929 3.1574 0.7103 1.4561
6 300,000 0.3 0.001 -1 0.0382 0.4855 0.8315 1.0604 0.7592 1.0271
7 30,000 0.8 0.001 -1 -0.0712 0.5567 0.7306 2.4334 1.3251 1.5457
8 300,000 0.8 0.001 -1 0.0818 0.2648 0.6272 0.6465 0.7114 0.5339
9 30,000 0.3 0.010 0 -0.8800 -0.8934 -0.9246 -0.7145 -0.9552 -0.8406
10 300,000 0.3 0.010 0 -0.3909 -0.0443 -0.3744 2.4960 1.0246 1.5486
11 30,000 0.8 0.010 0 -0.8678 -0.8670 -0.8660 -0.0071 -0.7144 -0.2569
12 300,000 0.8 0.010 0 -0.1071 0.2484 -0.1010 2.1076 1.2397 1.5087
13 30,000 0.3 0.001 0 -0.3302 0.6349 -0.0978 3.0676 0.8870 1.6256
14 300,000 0.3 0.001 0 0.0121 0.3567 0.5443 1.1586 0.7176 0.8452
15 30,000 0.8 0.001 0 0.0068 0.5925 0.5504 2.0008 1.1088 1.5263
16 300,000 0.8 0.001 0 0.0372 0.2910 0.5814 0.8824 0.7017 0.6361
17 30,000 0.3 0.010 1 -0.9314 -0.9401 -0.9535 -0.7406 -0.9760 -0.7875
18 300,000 0.3 0.010 1 -0.5024 -0.2669 -0.4772 2.3833 0.9816 1.6669
19 30,000 0.8 0.010 1 -0.8261 -0.9230 -0.8909 -0.3114 -0.7848 -0.5560
20 300,000 0.8 0.010 1 -0.1058 0.3179 -0.1140 2.1520 1.2852 1.6884
21 30,000 0.3 0.001 1 -0.3767 0.3615 -0.2946 2.3399 1.1233 1.7798
22 300,000 0.3 0.001 1 0.0689 0.3582 0.5812 1.2500 0.6666 0.8880
23 30,000 0.8 0.001 1 0.0281 0.7811 0.4752 2.1502 1.2268 1.4006
24 300,000 0.8 0.001 1 0.0462 0.3228 0.5375 0.7042 0.6861 0.4428

Figure: Relative change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes.

======= Evaluating Winner’s Curse methods

In this document, we intend to investigate the following key questions, assuming a fixed array of \(10^6\) SNPs and a quantitative trait in which SNP effect sizes follow a normal distribution:

1. When is Winner’s Curse a problem?

In this section, we look at the average number of significant SNPs and the average proportion of these significant SNPs whose association estimates are more extreme than their true effect sizes at two different thresholds: the common genome-wide significance threshold of \(5 \times 10^{-8}\) and a more lenient threshold of \(5 \times 10^{-4}\). We consider these properties under combinations of values for the following parameters:

  1. sample size - n_samples
  2. heritability - h2
  3. polygenicity, i.e. proportion of effect SNPs - prop_effect
  4. selection coefficient - S

The 24 different combinations that we will investigate throughout this document are detailed below:

Scenario n_samples h2 prop_effect S
1 30,000 0.3 0.010 -1
2 300,000 0.3 0.010 -1
3 30,000 0.8 0.010 -1
4 300,000 0.8 0.010 -1
5 30,000 0.3 0.001 -1
6 300,000 0.3 0.001 -1
7 30,000 0.8 0.001 -1
8 300,000 0.8 0.001 -1
9 30,000 0.3 0.010 0
10 300,000 0.3 0.010 0
11 30,000 0.8 0.010 0
12 300,000 0.8 0.010 0
13 30,000 0.3 0.001 0
14 300,000 0.3 0.001 0
15 30,000 0.8 0.001 0
16 300,000 0.8 0.001 0
17 30,000 0.3 0.010 1
18 300,000 0.3 0.010 1
19 30,000 0.8 0.010 1
20 300,000 0.8 0.010 1
21 30,000 0.3 0.001 1
22 300,000 0.3 0.001 1
23 30,000 0.8 0.001 1
24 300,000 0.8 0.001 1


Running the code provided in nsig_prop_bias_100sim.R, we obtain the following results:

Scenario n_samples h2 prop_effect S n_sig_5e-8 sd(n_sig)_5e-8 prop_bias_5e-8 sd(prop_bias)_5e-8 n_sig_5e-4 sd(n_sig)_5e-4 prop_bias_5e-4 sd(prop_bias)_5e-4
1 30,000 0.3 0.010 -1 0.70 0.745 0.5600 0.4989 610.55 24.819 0.9996 0.0008
2 300,000 0.3 0.010 -1 848.63 18.145 0.7606 0.0142 3202.84 39.990 0.7461 0.0070
3 30,000 0.8 0.010 -1 31.85 5.208 0.9796 0.0232 1084.25 34.857 0.9597 0.0053
4 300,000 0.8 0.010 -1 2760.68 30.091 0.6283 0.0085 5354.55 49.606 0.6393 0.0069
5 30,000 0.3 0.001 -1 86.90 6.317 0.7524 0.0466 767.58 24.228 0.8957 0.0090
6 300,000 0.3 0.001 -1 568.70 14.074 0.5520 0.0197 1215.12 28.204 0.7303 0.0128
7 30,000 0.8 0.001 -1 276.26 10.413 0.6209 0.0286 985.80 24.088 0.8043 0.0117
8 300,000 0.8 0.001 -1 727.10 12.491 0.5289 0.0175 1323.99 25.015 0.7068 0.0127
9 30,000 0.3 0.010 0 1.45 1.298 0.7400 0.4408 622.70 20.047 0.9985 0.0015
10 300,000 0.3 0.010 0 882.45 18.435 0.7256 0.0144 3053.36 41.345 0.7427 0.0070
11 30,000 0.8 0.010 0 48.06 6.350 0.9524 0.0305 1113.04 27.657 0.9407 0.0068
12 300,000 0.8 0.010 0 2586.78 32.771 0.6201 0.0080 5021.84 49.899 0.6455 0.0062
13 30,000 0.3 0.001 0 88.32 6.377 0.7278 0.0435 752.65 23.238 0.8953 0.0097
14 300,000 0.3 0.001 0 531.27 12.431 0.5553 0.0239 1177.90 26.600 0.7402 0.0121
15 30,000 0.8 0.001 0 257.44 8.936 0.6212 0.0265 950.39 26.269 0.8130 0.0100
16 300,000 0.8 0.001 0 691.46 13.526 0.5322 0.0194 1299.19 25.483 0.7114 0.0110
17 30,000 0.3 0.010 1 2.55 1.623 0.9400 0.2387 639.02 23.892 0.9966 0.0022
18 300,000 0.3 0.010 1 919.28 15.799 0.7038 0.0137 2910.10 40.332 0.7313 0.0086
19 30,000 0.8 0.010 1 68.13 7.795 0.9215 0.0277 1148.44 26.520 0.9208 0.0077
20 300,000 0.8 0.010 1 2433.27 30.705 0.6097 0.0104 4630.19 49.334 0.6464 0.0078
21 30,000 0.3 0.001 1 93.15 5.960 0.7145 0.0422 739.81 26.383 0.8944 0.0102
22 300,000 0.3 0.001 1 482.94 12.742 0.5519 0.0194 1120.17 26.197 0.7510 0.0116
23 30,000 0.8 0.001 1 244.29 9.237 0.6119 0.0315 909.17 24.092 0.8199 0.0110
24 300,000 0.8 0.001 1 632.74 14.730 0.5330 0.0193 1237.86 25.797 0.7201 0.0121

For both thresholds, the average number of significant SNPs increases with sample size, as expected. It also increases with heritability. The effect of changing prop_effect is more interesting. Decreasing the proportion of effect SNPs from 0.01 to 0.001 increases the number of SNPs passing the genome-wide significance threshold of \(5 \times 10^{-8}\) at a sample size of 30,000, but decreases it at the larger sample size of 300,000. At the \(5 \times 10^{-4}\) threshold, the same decrease occurs at 300,000 samples, while at 30,000 samples the direction depends on heritability: the count rises when h2 = 0.3 (e.g. 610.55 to 767.58 for S = -1) and falls when h2 = 0.8 (e.g. 1084.25 to 985.80 for S = -1).

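Furthermore, increasing sample size and increasing heritability from 0.3 to 0.8 both tend to decrease the fraction of significant SNPs whose estimates are more extreme than their true effect sizes. Decreasing the proportion of effect SNPs from 0.01 to 0.001 has the same effect at a significance threshold of \(5 \times 10^{-8}\). Note that the proportions for scenarios 1, 9 and 17 at this threshold are averaged over very few significant SNPs (fewer than three on average), which is reflected in their large standard deviations.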
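
To make the two quantities discussed above concrete, the following Python sketch simulates one set of summary statistics and counts the significant SNPs and the share of them with exaggerated estimates. It is not the code in nsig_prop_bias_100sim.R. It assumes standardised genotypes (so every standard error is \(1/\sqrt{n}\)), effect SNPs that share the heritability equally, a two-sided threshold, and "more extreme" meaning \(|\hat\beta| > |\beta|\); it also ignores allele frequencies and the selection coefficient S entirely. The function name `simulate_once` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_once(n_snps, n_samples, h2, prop_effect, p_threshold, rng):
    """Return (n_sig, prop_bias) for one simulated set of summary statistics."""
    n_effect = round(n_snps * prop_effect)
    sd_beta = math.sqrt(h2 / n_effect)  # assumption: effect SNPs share h2 equally
    se = 1.0 / math.sqrt(n_samples)     # assumption: standardised genotypes
    cutoff = NormalDist().inv_cdf(1.0 - p_threshold / 2.0)  # two-sided z cut-off

    n_sig = 0
    n_exaggerated = 0
    for i in range(n_snps):
        # The first n_effect SNPs carry a true effect; the rest are null.
        beta = rng.gauss(0.0, sd_beta) if i < n_effect else 0.0
        beta_hat = beta + rng.gauss(0.0, se)
        if abs(beta_hat / se) > cutoff:
            n_sig += 1
            if abs(beta_hat) > abs(beta):
                n_exaggerated += 1

    # The proportion is undefined when no SNP is significant.
    prop_bias = n_exaggerated / n_sig if n_sig > 0 else None
    return n_sig, prop_bias


rng = random.Random(1)
# A smaller array than the 10^6 SNPs used in the document, to keep the run short.
n_sig, prop_bias = simulate_once(200_000, 30_000, 0.8, 0.01, 5e-8, rng)
print(n_sig, prop_bias)
```

Returning `None` when no SNP is significant keeps such simulations distinguishable from simulations in which no significant SNP was exaggerated.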

To gain better insight into the information detailed in the above table, we simulate a single set of GWAS summary statistics for each of the 24 different scenarios and plot \(z\) against \(\text{bias}\), where \(\text{bias} = \hat\beta - \beta\). On all figures, the bright red line corresponds to the significance threshold of \(5 \times 10^{-8}\) while the darker red line corresponds to \(5 \times 10^{-4}\).

2. Evaluating methods using a significance threshold of \(5 \times 10^{-8}\)

Using the code detailed in norm_5e-8_10sim.R and a total of 10 simulations, we evaluated six different Winner’s Curse methods across each of the 24 scenarios using the following three bias evaluation metrics:

  1. The average fraction of significant SNPs whose association estimates are less biased after the method is applied
  2. The average change in the average MSE of significant SNPs after the method is applied
  3. The average relative change in the average MSE of significant SNPs after the method is applied

Note: All averages were obtained over only those simulations in which at least one significant SNP was detected.
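
One way the three metrics and the averaging rule in the note could be computed is sketched below in Python. This is an illustration, not the code in norm_5e-8_10sim.R: the function names are hypothetical, "less biased" is assumed to mean that the adjusted estimate is closer to the true effect in absolute value, and changes are assumed to be measured as adjusted minus naive, so negative values indicate an improvement.

```python
def mse(estimates, truths):
    """Mean squared error between estimates and true effects."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def evaluate_method(beta_true, beta_naive, beta_adj):
    """Three metrics for one simulation; inputs cover significant SNPs only.

    Returns None when there are no significant SNPs. Assumes the naive MSE
    is non-zero, which holds in any realistic simulation.
    """
    n = len(beta_true)
    if n == 0:
        return None
    less_biased = sum(
        abs(adj - true) < abs(naive - true)
        for true, naive, adj in zip(beta_true, beta_naive, beta_adj)
    )
    mse_naive = mse(beta_naive, beta_true)
    mse_adj = mse(beta_adj, beta_true)
    return {
        "frac_less_biased": less_biased / n,
        "mse_change": mse_adj - mse_naive,
        "rel_mse_change": (mse_adj - mse_naive) / mse_naive,
    }


def average_over_simulations(results):
    """Average each metric over simulations with at least one significant SNP."""
    kept = [r for r in results if r is not None]
    if not kept:
        return None
    return {key: sum(r[key] for r in kept) / len(kept) for key in kept[0]}


# Toy values for illustration only; they are not taken from the simulations.
one_sim = evaluate_method(
    beta_true=[0.10, -0.20, 0.05],
    beta_naive=[0.14, -0.25, 0.09],
    beta_adj=[0.11, -0.22, 0.10],
)
averaged = average_over_simulations([one_sim, None])
print(averaged)
```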

The average MSE over \(n\) significant SNPs is defined as: \[\frac{1}{n} \sum^n_{i=1} (\hat\beta_i - \beta_i)^2\] in which \(\left| \hat\beta_i / \hat\sigma_i \right| > c\) for all \(i = 1,...,n\). Here \(\hat\beta_i\) is the estimated effect size of SNP \(i\), \(\hat\sigma_i\) is its estimated standard error, \(\beta_i\) is its true effect size and \(c\) is the \(z\)-statistic cut-off corresponding to the chosen significance threshold.
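
As a worked illustration of this definition, the Python sketch below converts a p-value threshold into a cut-off \(c\) and averages the squared errors of the SNPs that pass it. It assumes a two-sided test, so that \(c = \Phi^{-1}(1 - p/2)\); the function names and the toy inputs are hypothetical.

```python
from statistics import NormalDist


def z_cutoff(p_threshold):
    """Two-sided z-statistic cut-off c for a given p-value threshold (assumed)."""
    return NormalDist().inv_cdf(1.0 - p_threshold / 2.0)


def significant_mse(beta_hat, se_hat, beta_true, p_threshold):
    """Average MSE over SNPs with |beta_hat / se_hat| > c, or None if none pass."""
    c = z_cutoff(p_threshold)
    squared_errors = [
        (b - t) ** 2
        for b, s, t in zip(beta_hat, se_hat, beta_true)
        if abs(b / s) > c
    ]
    if not squared_errors:
        return None
    return sum(squared_errors) / len(squared_errors)


# Toy inputs for illustration only: the second SNP does not pass either threshold.
beta_hat = [0.040, 0.010, -0.035]
se_hat = [0.006, 0.006, 0.006]
beta_true = [0.030, 0.000, -0.030]

print(round(z_cutoff(5e-8), 2), round(z_cutoff(5e-4), 2))
print(significant_mse(beta_hat, se_hat, beta_true, 5e-8))
```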

Results of the simulations are plotted with error bars included. These figures allow us to see more clearly the scenarios in which it would be beneficial to apply a Winner’s Curse correction method and also, provide us with a better indication of which method we should use.

Summary of results contained in norm_5e-8_flb_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 1.0000 1.0000 1.0000 0.9444 1.0000 0.9375
2 300,000 0.3 0.010 -1 0.6465 0.4422 0.6078 0.5290 0.4960 0.5038
3 30,000 0.8 0.010 -1 0.8838 0.7446 0.7949 0.6342 0.7587 0.6466
4 300,000 0.8 0.010 -1 0.5681 0.2803 0.5299 0.5149 0.4820 0.4972
5 30,000 0.3 0.001 -1 0.6840 0.3621 0.5192 0.5419 0.5184 0.4950
6 300,000 0.3 0.001 -1 0.4882 0.1447 0.3583 0.5031 0.4694 0.5014
7 30,000 0.8 0.001 -1 0.5195 0.2095 0.3674 0.5085 0.4884 0.4802
8 300,000 0.8 0.001 -1 0.4703 0.1016 0.3837 0.5137 0.4776 0.4895
9 30,000 0.3 0.010 0 0.8889 1.0000 0.9238 0.9333 0.9667 1.0000
10 300,000 0.3 0.010 0 0.6178 0.4004 0.5795 0.5233 0.4918 0.5167
11 30,000 0.8 0.010 0 0.8420 0.6658 0.7481 0.6007 0.6602 0.6383
12 300,000 0.8 0.010 0 0.5568 0.2688 0.5268 0.5125 0.4900 0.4962
13 30,000 0.3 0.001 0 0.6267 0.3232 0.4617 0.5343 0.5243 0.5299
14 300,000 0.3 0.001 0 0.4802 0.1446 0.3471 0.5164 0.4747 0.4871
15 30,000 0.8 0.001 0 0.5025 0.2091 0.3321 0.5143 0.4823 0.4997
16 300,000 0.8 0.001 0 0.4821 0.1072 0.4004 0.4994 0.4633 0.4795
17 30,000 0.3 0.010 1 0.8021 0.8883 0.8020 0.7852 0.9000 0.8520
18 300,000 0.3 0.010 1 0.6089 0.3625 0.5512 0.5336 0.4908 0.5006
19 30,000 0.8 0.010 1 0.8150 0.5901 0.6627 0.5702 0.6126 0.5875
20 300,000 0.8 0.010 1 0.5501 0.2605 0.5147 0.5152 0.4774 0.4970
21 30,000 0.3 0.001 1 0.5419 0.2678 0.4119 0.5413 0.4798 0.5144
22 300,000 0.3 0.001 1 0.4867 0.1301 0.3502 0.5046 0.4911 0.4994
23 30,000 0.8 0.001 1 0.4988 0.1816 0.3266 0.4995 0.4768 0.5032
24 300,000 0.8 0.001 1 0.4588 0.1123 0.3954 0.4999 0.4603 0.4911

Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-8}\):

Summary of results contained in norm_5e-8_mse-change2_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.002070 -0.001182 -0.000615 -0.000895 -0.001887 -0.001006
2 300,000 0.3 0.010 -1 -0.000007 0.000002 -0.000004 0.000053 0.000022 0.000036
3 30,000 0.8 0.010 -1 -0.000425 -0.000376 -0.000425 0.000271 -0.000300 -0.000058
4 300,000 0.8 0.010 -1 -0.000002 0.000005 0.000000 0.000034 0.000020 0.000027
5 30,000 0.3 0.001 -1 -0.000058 0.000118 0.000058 0.000450 0.000223 0.000433
6 300,000 0.3 0.001 -1 0.000000 0.000006 0.000013 0.000013 0.000010 0.000013
7 30,000 0.8 0.001 -1 0.000014 0.000106 0.000171 0.000335 0.000226 0.000267
8 300,000 0.8 0.001 -1 0.000001 0.000003 0.000007 0.000009 0.000025 0.000015
9 30,000 0.3 0.010 0 -0.000450 -0.002613 -0.000263 -0.000197 -0.000661 -0.001975
10 300,000 0.3 0.010 0 -0.000003 0.000000 -0.000002 0.000026 0.000011 0.000019
11 30,000 0.8 0.010 0 -0.000169 -0.000117 -0.000156 0.000211 -0.000043 0.000016
12 300,000 0.8 0.010 0 -0.000001 0.000002 0.000000 0.000021 0.000011 0.000015
13 30,000 0.3 0.001 0 -0.000024 0.000067 0.000039 0.000265 0.000110 0.000167
14 300,000 0.3 0.001 0 0.000001 0.000004 0.000010 0.000014 0.000009 0.000010
15 30,000 0.8 0.001 0 -0.000001 0.000067 0.000131 0.000223 0.000124 0.000145
16 300,000 0.8 0.001 0 0.000002 0.000004 0.000007 0.000010 0.000024 0.000013
17 30,000 0.3 0.010 1 -0.000415 -0.000407 -0.000476 -0.000226 -0.000393 -0.000387
18 300,000 0.3 0.010 1 -0.000002 0.000002 0.000000 0.000022 0.000012 0.000016
19 30,000 0.8 0.010 1 -0.000106 -0.000105 -0.000103 0.000251 0.000017 0.000080
20 300,000 0.8 0.010 1 -0.000001 0.000002 0.000000 0.000017 0.000009 0.000012
21 30,000 0.3 0.001 1 -0.000009 0.000071 0.000042 0.000194 0.000102 0.000168
22 300,000 0.3 0.001 1 0.000001 0.000004 0.000008 0.000011 0.000007 0.000009
23 30,000 0.8 0.001 1 -0.000001 0.000053 0.000124 0.000164 0.000079 0.000119
24 300,000 0.8 0.001 1 0.000002 0.000003 0.000005 0.000007 0.000043 0.000019

Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\):

Summary of results contained in norm_5e-8_mse-change_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.2790 -0.3746 -0.4888 -0.0375 -0.6638 -0.4246
2 300,000 0.3 0.010 -1 -0.3203 0.0175 -0.2250 2.6836 1.0236 1.5283
3 30,000 0.8 0.010 -1 -0.6762 -0.8032 -0.7688 0.4359 -0.4902 -0.1540
4 300,000 0.8 0.010 -1 -0.0887 0.2997 0.0047 1.9750 1.1278 1.4642
5 30,000 0.3 0.001 -1 -0.2628 0.6090 0.2191 3.6710 0.9562 1.2863
6 300,000 0.3 0.001 -1 0.0461 0.4218 0.9426 0.9694 0.6575 0.7537
7 30,000 0.8 0.001 -1 0.0277 0.8386 0.9865 1.9543 1.2192 1.6268
8 300,000 0.8 0.001 -1 0.0703 0.2036 0.4137 0.6165 1.4676 0.6500
9 30,000 0.3 0.010 0 -0.6510 -0.7090 -0.6114 -0.1210 -0.4323 -0.5700
10 300,000 0.3 0.010 0 -0.2865 0.0660 -0.1810 2.1525 1.0570 1.5284
11 30,000 0.8 0.010 0 -0.6611 -0.5806 -0.6618 1.1235 -0.2887 0.2525
12 300,000 0.8 0.010 0 -0.1029 0.2029 -0.0318 2.0116 1.0700 1.4572
13 30,000 0.3 0.001 0 -0.2009 0.7957 0.3071 2.5368 0.9833 1.6376
14 300,000 0.3 0.001 0 0.0535 0.4545 0.8911 1.3077 0.7226 0.9414
15 30,000 0.8 0.001 0 -0.0067 0.6215 0.9349 2.1777 1.2103 1.6674
16 300,000 0.8 0.001 0 0.1373 0.3316 0.4154 0.7637 2.0854 1.1382
17 30,000 0.3 0.010 1 -0.6691 -0.5763 -0.6516 0.2659 -0.8541 3.4418
18 300,000 0.3 0.010 1 -0.2115 0.2017 -0.0248 2.3109 1.1186 1.5671
19 30,000 0.8 0.010 1 -0.6191 -0.4023 -0.4801 1.2340 -0.0312 0.6310
20 300,000 0.8 0.010 1 -0.0730 0.2276 0.0080 1.8310 1.0043 1.3454
21 30,000 0.3 0.001 1 -0.1434 0.6273 0.4781 2.4055 1.0646 1.4649
22 300,000 0.3 0.001 1 0.0446 0.3751 0.9598 1.2718 0.6395 0.7813
23 30,000 0.8 0.001 1 0.0154 0.6661 1.4321 1.7884 1.0911 1.3269
24 300,000 0.8 0.001 1 0.1938 0.2325 0.4878 0.8037 4.3754 1.4777


Relative change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\):

3. Evaluating methods using a significance threshold of \(5 \times 10^{-4}\)

As in Section 2 above, we use the code detailed in norm_5e-4_10sim.R with a total of 10 simulations to evaluate the six Winner’s Curse methods across each of the 24 scenarios. The same three bias evaluation metrics are considered, although results for the third metric are not yet available at this threshold.

Summary of results contained in norm_5e-4_flb_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 0.9454 0.9449 0.9724 0.9695 0.9908 0.9810
2 300,000 0.3 0.010 -1 0.5998 0.4608 0.5225 0.6079 0.6104 0.6041
3 30,000 0.8 0.010 -1 0.7565 0.7121 0.7531 0.7961 0.8548 0.8243
4 300,000 0.8 0.010 -1 0.5487 0.3505 0.4950 0.5632 0.5541 0.5527
5 30,000 0.3 0.001 -1 0.7917 0.7348 0.7658 0.8377 0.8404 0.8363
6 300,000 0.3 0.001 -1 0.6661 0.4972 0.6065 0.7068 0.6953 0.7118
7 30,000 0.8 0.001 -1 0.7078 0.6059 0.6425 0.7589 0.7546 0.7597
8 300,000 0.8 0.001 -1 0.6461 0.4438 0.6091 0.6983 0.6785 0.6876
9 30,000 0.3 0.010 0 0.9257 0.9170 0.9541 0.9556 0.9836 0.9656
10 300,000 0.3 0.010 0 0.5921 0.4611 0.5253 0.6127 0.6076 0.6077
11 30,000 0.8 0.010 0 0.7472 0.6949 0.7264 0.7870 0.8369 0.8084
12 300,000 0.8 0.010 0 0.5487 0.3547 0.5040 0.5672 0.5561 0.5614
13 30,000 0.3 0.001 0 0.7988 0.7340 0.7680 0.8391 0.8475 0.8430
14 300,000 0.3 0.001 0 0.6666 0.5070 0.6019 0.7161 0.6998 0.7102
15 30,000 0.8 0.001 0 0.7119 0.6147 0.6526 0.7682 0.7640 0.7697
16 300,000 0.8 0.001 0 0.6521 0.4562 0.6156 0.6935 0.6720 0.6790
17 30,000 0.3 0.010 1 0.8997 0.8978 0.9285 0.9448 0.9684 0.9548
18 300,000 0.3 0.010 1 0.5944 0.4521 0.5166 0.6140 0.6135 0.6090
19 30,000 0.8 0.010 1 0.7288 0.6709 0.7072 0.7641 0.8077 0.7797
20 300,000 0.8 0.010 1 0.5524 0.3568 0.5041 0.5718 0.5594 0.5648
21 30,000 0.3 0.001 1 0.7987 0.7440 0.7694 0.8492 0.8467 0.8481
22 300,000 0.3 0.001 1 0.6791 0.5203 0.6215 0.7270 0.7234 0.7250
23 30,000 0.8 0.001 1 0.7313 0.6308 0.6758 0.7742 0.7757 0.7785
24 300,000 0.8 0.001 1 0.6465 0.4756 0.6204 0.7071 0.6843 0.7013

Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-4}\):

Summary of results contained in norm_5e-4_mse-change2_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 -0.001804 -0.001801 -0.001738 -0.001627 -0.001359 -0.001496
2 300,000 0.3 0.010 -1 -0.000020 -0.000011 -0.000015 -0.000012 -0.000018 -0.000016
3 30,000 0.8 0.010 -1 -0.000897 -0.000889 -0.000896 -0.000799 -0.000779 -0.000826
4 300,000 0.8 0.010 -1 -0.000010 -0.000001 -0.000006 -0.000004 -0.000007 -0.000007
5 30,000 0.3 0.001 -1 -0.001187 -0.001180 -0.001078 -0.001162 -0.000976 -0.001079
6 300,000 0.3 0.001 -1 -0.000077 -0.000069 -0.000057 -0.000071 -0.000062 -0.000068
7 30,000 0.8 0.001 -1 -0.000895 -0.000824 -0.000722 -0.000876 -0.000762 -0.000846
8 300,000 0.8 0.001 -1 -0.000066 -0.000065 -0.000054 -0.000068 -0.000047 -0.000060
9 30,000 0.3 0.010 0 -0.001741 -0.001753 -0.001602 -0.001492 -0.001276 -0.001402
10 300,000 0.3 0.010 0 -0.000023 -0.000022 -0.000021 -0.000022 -0.000023 -0.000023
11 30,000 0.8 0.010 0 -0.000849 -0.000875 -0.000815 -0.000802 -0.000736 -0.000834
12 300,000 0.8 0.010 0 -0.000013 -0.000008 -0.000010 -0.000009 -0.000012 -0.000011
13 30,000 0.3 0.001 0 -0.001291 -0.001264 -0.001195 -0.001171 -0.001000 -0.001160
14 300,000 0.3 0.001 0 -0.000078 -0.000070 -0.000057 -0.000072 -0.000065 -0.000072
15 30,000 0.8 0.001 0 -0.000979 -0.000940 -0.000847 -0.000931 -0.000786 -0.000888
16 300,000 0.8 0.001 0 -0.000065 -0.000065 -0.000054 -0.000067 -0.000043 -0.000059
17 30,000 0.3 0.010 1 -0.001656 -0.001708 -0.001548 -0.001517 -0.001213 -0.001378
18 300,000 0.3 0.010 1 -0.000024 -0.000023 -0.000022 -0.000024 -0.000024 -0.000026
19 30,000 0.8 0.010 1 -0.000828 -0.000830 -0.000795 -0.000802 -0.000677 -0.000753
20 300,000 0.8 0.010 1 -0.000014 -0.000012 -0.000012 -0.000013 -0.000014 -0.000015
21 30,000 0.3 0.001 1 -0.001351 -0.001330 -0.001184 -0.001192 -0.000998 -0.001148
22 300,000 0.3 0.001 1 -0.000083 -0.000076 -0.000063 -0.000081 -0.000067 -0.000076
23 30,000 0.8 0.001 1 -0.001087 -0.001002 -0.000889 -0.001002 -0.000821 -0.000902
24 300,000 0.8 0.001 1 -0.000072 -0.000071 -0.000057 -0.000073 -0.000041 -0.000065

Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-4}\):

TO DO: Simulate the third bias evaluation metric for this significance threshold to obtain norm_5e-4_mse-change_10sim.csv. Plot in a similar manner to that above.


4. Skewed distribution of effect sizes

Here we investigate the 24 different scenarios under a skewed distribution of effect sizes. To create a bimodal distribution, we simulate the effect sizes of half of the true effect SNPs from a normal distribution centered at 0, while the other half are generated from a normal distribution with mean 2.5. As above, we first look at the expected number of significant SNPs and the expected proportion of those whose association estimates are exaggerated.
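
The two-component mixture described above can be sketched in Python as follows. This is an illustration rather than the author's simulation code: the function name `draw_bimodal_effects` is hypothetical, both components are assumed to have unit standard deviation, and the draws are raw values that have not been rescaled to match a target heritability.

```python
import random


def draw_bimodal_effects(n_effect, rng, shifted_mean=2.5, sd=1.0):
    """Draw effect sizes: half from N(0, sd^2), half from N(shifted_mean, sd^2).

    The common standard deviation sd is an assumption; the text only
    specifies the two component means.
    """
    half = n_effect // 2
    effects = [rng.gauss(0.0, sd) for _ in range(half)]
    effects += [rng.gauss(shifted_mean, sd) for _ in range(n_effect - half)]
    rng.shuffle(effects)  # remove the ordering by component
    return effects


rng = random.Random(42)
effects = draw_bimodal_effects(10_000, rng)
sample_mean = sum(effects) / len(effects)
print(len(effects), sample_mean)
```

With equal mixing weights, the mean of the raw draws should sit near the midpoint of the two component means.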

Running the code provided in nsig_prop_bias_100sim.R, we obtain the following results:

Scenario n_samples h2 prop_effect S n_sig 5e-8 sd(n_sig) 5e-8 prop_bias 5e-8 sd(prop_bias) 5e-8
1 30,000 0.3 0.010 -1 0.70 0.916 0.5400 0.5009
2 300,000 0.3 0.010 -1 851.27 18.889 0.7649 0.0130
3 30,000 0.8 0.010 -1 27.10 5.361 0.9880 0.0211
4 300,000 0.8 0.010 -1 2793.70 32.045 0.6260 0.0094
5 30,000 0.3 0.001 -1 85.32 5.255 0.7610 0.0444
6 300,000 0.3 0.001 -1 575.74 12.120 0.5517 0.0202
7 30,000 0.8 0.001 -1 279.53 9.591 0.6264 0.0279
8 300,000 0.8 0.001 -1 729.53 11.935 0.5272 0.0188
9 30,000 0.3 0.010 0 0.52 0.810 0.5200 0.5021
10 300,000 0.3 0.010 0 896.33 19.033 0.7538 0.0125
11 30,000 0.8 0.010 0 28.61 4.752 0.9917 0.0170
12 300,000 0.8 0.010 0 2764.21 29.531 0.6165 0.0084
13 30,000 0.3 0.001 0 90.29 6.253 0.7590 0.0473
14 300,000 0.3 0.001 0 546.47 11.184 0.5485 0.0224
15 30,000 0.8 0.001 0 276.46 10.923 0.6153 0.0303
16 300,000 0.8 0.001 0 702.71 12.246 0.5255 0.0171
17 30,000 0.3 0.010 1 0.65 0.744 0.3800 0.4878
18 300,000 0.3 0.010 1 913.62 19.852 0.7889 0.0122
19 30,000 0.8 0.010 1 20.87 4.341 0.9989 0.0080
20 300,000 0.8 0.010 1 2959.28 27.593 0.6104 0.0084
21 30,000 0.3 0.001 1 91.49 5.718 0.7817 0.0415
22 300,000 0.3 0.001 1 534.82 9.277 0.5417 0.0216
23 30,000 0.8 0.001 1 296.23 9.175 0.6126 0.0283
24 300,000 0.8 0.001 1 658.63 11.636 0.5258 0.0174


Next, we repeat the process illustrated in Section 2 using the same bias evaluation metrics with a significance threshold of \(5 \times 10^{-8}\).

Summary of results contained in skew_5e-8_flb_10sim.csv:

Scenario n_samples h2 prop_effect S EB FIQT BR cl1 cl2 cl3
1 30,000 0.3 0.010 -1 0.4000 0.4000 0.5000 0.3500 0.6000 0.4000
2 300,000 0.3 0.010 -1 0.6551 0.4493 0.6239 0.5250 0.4993 0.5099
3 30,000 0.8 0.010 -1 0.9219 0.8157 0.8430 0.5993 0.7826 0.6696
4 300,000 0.8 0.010 -1 0.5642 0.2830 0.5399 0.5135 0.4877 0.4946
5 30,000 0.3 0.001 -1 0.6407 0.3385 0.5081 0.5129 0.5228 0.5039
6 300,000 0.3 0.001 -1 0.4893 0.1320 0.3733 0.5001 0.4867 0.5017
7 30,000 0.8 0.001 -1 0.5348 0.2346 0.3760 0.5194 0.4815 0.4969
8 300,000 0.8 0.001 -1 0.4728 0.0979 0.4023 0.5118 0.4724 0.4949
9 30,000 0.3 0.010 0 0.5000 0.2000 0.6500 0.5000 0.6000 0.2000
10 300,000 0.3 0.010 0 0.6371 0.4376 0.6127 0.5294 0.5011 0.5043
11 30,000 0.8 0.010 0 0.9168 0.7756 0.9011 0.6288 0.7657 0.6856
12 300,000 0.8 0.010 0 0.5576 0.2742 0.5339 0.5148 0.4895 0.4931
13 30,000 0.3 0.001 0 0.6110 0.3007 0.5274 0.5337 0.5013 0.5006
14 300,000 0.3 0.001 0 0.5028 0.1378 0.3798 0.5044 0.4790 0.4923
15 30,000 0.8 0.001 0 0.5269 0.2025 0.3654 0.5132 0.4802 0.5086
16 300,000 0.8 0.001 0 0.4787 0.1036 0.3968 0.5069 0.4732 0.4915
17 30,000 0.3 0.010 1 0.4000 0.3000 0.3000 0.4000 0.7000 0.2000
18 300,000 0.3 0.010 1 0.6663 0.4792 0.6488 0.5359 0.4966 0.5105
19 30,000 0.8 0.010 1 0.9172 0.8982 0.9400 0.7142 0.9132 0.7743
20 300,000 0.8 0.010 1 0.5511 0.2637 0.5385 0.5118 0.4811 0.4910
21 30,000 0.3 0.001 1 0.6622 0.3305 0.5850 0.5578 0.4967 0.5014
22 300,000 0.3 0.001 1 0.4956 0.1213 0.3791 0.5114 0.4672 0.5027
23 30,000 0.8 0.001 1 0.4926 0.1866 0.3482 0.5037 0.4856 0.4930
24 300,000 0.8 0.001 1 0.4799 0.0999 0.4081 0.5057 0.4682 0.4987

Fraction of significant SNPs less biased due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes:

TO DO:

Summary of results contained in skew_5e-8_mse-change2_10sim.csv:

Change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes:

Summary of results contained in skew_5e-8_mse-change_10sim.csv:

Relative change in average MSE over all significant SNPs due to method implementation, using a significance threshold of \(5 \times 10^{-8}\) and a skewed distribution of effect sizes:
